Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering RU alleles #49

Merged
merged 79 commits into from
Jan 6, 2023
Merged

Filtering RU alleles #49

merged 79 commits into from
Jan 6, 2023

Conversation

rnmitchell
Copy link
Contributor

This PR incorporates filtering using detection thresholds, analytical thresholds and stutter thresholds. This PR will only address the RU allele; LUSPlus allele filtering will be included in a later PR.

This PR also produces output files for use in either EuroForMix or STRmix.

@rnmitchell
Copy link
Contributor Author

@standage I've added additional functionalities since you've looked at this PR. I've added in the ability to create STRmix output files for NGS data (i.e. using the sequence strings from lusSTR) and perform filtering using detection thresholds, analytical thresholds and same size thresholds. Same size thresholds are when two sequences differ but are the same CE allele (sequence with highest number of reads are compared to other sequences, those with reads < threshold are dropped). Stutter filters are not yet implemented for sequence strings (much more complicated).

Comment on lines 25 to 33
def filters(data_order, locus, total_reads, datatype):
metadata = filter_marker_data[locus]
if len(data_order) == 1:
if thresholds('Detection', metadata, total_reads, data_order['Reads'][0])[1] is False:
data_order = pd.DataFrame()
elif thresholds('Analytical', metadata, total_reads, data_order['Reads'][0])[1] is False:
data_order[['allele_type', 'perc_noise']] = ['noise', 1.0]
elif thresholds('Analytical', metadata, total_reads, data_order['Reads'][0])[1] is True:
data_order['allele_type'] = 'real_allele'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is my biggest remaining source of concern. It's doing a lot of things, and I'm wondering if it can be decomposed into 2-4 smaller functions: one that handles len(data_order) == 1, one that handles datatype == "ce", and so on.

Also, data_order is the allele read count information, right? Another pass over this code to revisit variable names would probably improve legibility quite a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I'll take a look

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some refactoring and renaming variables to hopefully make things a little more clear. Let me know when you get a chance to look @standage and if it's up to your standard 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, just so we're clear, I don't think I code up to "my standard" very frequently. I hope this review has been helpful, or at least that it increases the likelihood that someone (possibly one of us) looking at this code 3 years will make sense of it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kidding! Hence the laughing face. The Standage standard is great!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And much needed for my code 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is much clearer now. Thanks for humoring me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your help!!

@standage standage merged commit 1a93638 into master Jan 6, 2023
@standage standage deleted the filtering branch January 6, 2023 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants